Preserved central model for faster bidirectional compression in distributed settings
We develop a new approach to tackle communication constraints in a distributed learning problem with a central server. We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as algorithms using only uplink (from the local workers to the central server) compression. To obtain this improvement, we design MCM, an algorithm such that the downlink compression only impacts local models, while the global model is preserved. As a result, and contrary to previous works, the gradients on local workers are computed on perturbed models. Consequently, convergence proofs are more challenging and require precise control of this perturbation. To ensure this control, MCM additionally combines model compression with a memory mechanism. This analysis opens new doors.
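To make the preserved-model idea concrete, here is a minimal numerical sketch of an MCM-style round. It is not the authors' implementation: the top-k compressor, the memory step size, and the toy least-squares workers are illustrative assumptions. The key point it shows is that the server's global model is never compressed; only the broadcast correction is, so workers compute gradients on a perturbed copy.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k(x, k):
    """Top-k sparsification: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Toy problem: worker i holds a least-squares objective (1/2)||A_i w - b_i||^2.
d, n_workers = 10, 4
data = [(rng.normal(size=(20, d)), rng.normal(size=20)) for _ in range(n_workers)]
grad_fns = [lambda w, A=A, b=b: A.T @ (A @ w - b) / len(b) for A, b in data]

w = np.zeros(d)       # global model on the server -- never compressed ("preserved")
memory = np.zeros(d)  # downlink memory, kept in sync by server and workers

for _ in range(500):
    # Downlink: compress only the gap between the global model and the memory.
    delta = top_k(w - memory, k=3)
    w_local = memory + delta   # the perturbed model that workers actually see
    memory += 0.5 * delta      # memory update (step size 0.5 is an assumed value)

    # Uplink: gradients are computed at the perturbed model, then compressed.
    g = np.mean([top_k(f(w_local), k=3) for f in grad_fns], axis=0)

    w -= 0.05 * g              # the server updates the uncompressed global model

print("distance of perturbed model to global model:", np.linalg.norm(w - w_local))
```

Because `memory` tracks the global model, the broadcast correction `w - memory` shrinks over time, which is what makes the perturbation of the local models controllable in the analysis.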
Communication-Efficient Topologies for Decentralized Learning with O(1) Consensus Rate
Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without a central server. Since communication tends to be slower than computation, an agent that communicates with only a few neighbors per iteration can complete iterations faster than one that communicates with many neighbors or with a central server. However, the total number of iterations needed to reach a network-wide solution is governed by how quickly the agents' information is ``mixed'' by communication. We found that popular communication topologies either have large degrees (such as stars and complete graphs) or are ineffective at mixing information (such as rings and grids). To address this problem, we propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate, the quantity used to measure mixing efficiency. In the proposed family, EquiStatic has a degree of $\Theta(\ln(n))$, where $n$ is the network size, while a series of time-varying one-peer topologies, EquiDyn, has a constant degree of 1. We generate EquiDyn through a certain random sampling procedure. Both achieve an $n$-independent consensus rate. We apply them to decentralized SGD and decentralized gradient tracking and obtain faster communication and better convergence, both theoretically and empirically.
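As a rough illustration of the one-peer idea behind EquiDyn, here is a sketch, not the paper's exact construction: each iteration samples a random circulant shift so that every node exchanges with exactly one peer, and the resulting gossip matrix is doubly stochastic, so repeated rounds drive all nodes toward the network average. The uniform shift sampling and the mixing weight `q` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_peer_gossip(n, q=0.5):
    """Sample W = (1 - q) I + q P_u, where P_u is the circulant permutation
    i -> (i + u) mod n. Every node communicates with exactly one peer
    (degree 1), and W is doubly stochastic, so the average is preserved."""
    u = rng.integers(1, n)                       # random shift (assumed uniform)
    P = np.zeros((n, n))
    P[np.arange(n), (np.arange(n) + u) % n] = 1.0
    return (1 - q) * np.eye(n) + q * P

n = 64
x = rng.normal(size=n)                           # one scalar value per node
target = x.mean()                                # the consensus value
for t in range(50):
    x = one_peer_gossip(n) @ x                   # one time-varying gossip round
print("consensus error after 50 rounds:", np.linalg.norm(x - target))
```

In decentralized SGD, such a sampled matrix would replace the fixed mixing matrix at each iteration; an EquiStatic-style alternative would instead superimpose $\Theta(\ln(n))$ such shifts into a single fixed graph.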